Sound Retrieval and Ranking Using Sparse Auditory Representations
نویسندگان
چکیده
To create systems that understand the sounds that humans are exposed to in everyday life, we need to represent sounds with features that can discriminate among many different sound classes. Here, we use a sound-ranking framework to quantitatively evaluate such representations in a large-scale task. We have adapted a machine-vision method, the passive-aggressive model for image retrieval (PAMIR), which efficiently learns a linear mapping from a very large sparse feature space to a large query-term space. Using this approach, we compare different auditory front ends and different ways of extracting sparse features from high-dimensional auditory images. We tested auditory models that use an adaptive pole-zero filter cascade (PZFC) auditory filter bank and sparse-code feature extraction from stabilized auditory images with multiple vector quantizers. In addition to auditory image models, we compare a family of more conventional mel-frequency cepstral coefficient (MFCC) front ends. The experimental results show a significant advantage for the auditory models over vector-quantized MFCCs. When thousands of sound files with a query vocabulary of thousands of words were ranked, the best precision at top-1 was 73% and the average precision was 35%, reflecting a 18% improvement over the best competing MFCC front end.
منابع مشابه
Sound Ranking Using Auditory Sparse-Code Representations
The task of ranking sounds from text queries is a good test application for machine-hearing techniques, and particularly for comparison and evaluation of alternative sound representations in a large-scale setting. We have adapted a machine-vision system, “passive-aggressive model for image retrieval” (PAMIR) [2], which efficiently learns, using a ranking-based cost function, a linear mapping fr...
متن کاملImage Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing where there exist many low-level representations of image, i.e., SIFT, HOG and so on. But there is a missing link across low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principle component analysis are employed to d...
متن کاملLearning of sparse auditory receptive fields
It is largely unknown how the properties of the auditory system relate to the properties of natural sounds. Here, we analyze representations of simulated neurons that have optimally sparse activity in response to spectrotemporal speech data. These representations share important properties with auditory neurons as determined in electrophysiological experiments.
متن کاملAuditory-inspired sparse representation of audio signals
This article deals with the generation of auditory-inspired spectro-temporal features aimed at audio coding. To do so, we first generate sparse audio representations we call spikegrams, using projections on gammatone/gammachirp kernels that generate neural spikes. Unlike Fourier-based representations, these representations are powerful at identifying auditory events, such as onsets, offsets, tr...
متن کاملAuditory Sketches: Very Sparse Representations of Sounds Are Still Recognizable
Sounds in our environment like voices, animal calls or musical instruments are easily recognized by human listeners. Understanding the key features underlying this robust sound recognition is an important question in auditory science. Here, we studied the recognition by human listeners of new classes of sounds: acoustic and auditory sketches, sounds that are severely impoverished but still reco...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Neural computation
دوره 22 9 شماره
صفحات -
تاریخ انتشار 2010